SECTION 1 - INTRODUCTION


This is the Final Engineering Report prepared for SRI International under
Subcontract No. 14395 covering the implementation of software fault tolerance
for critical modules of the SIFT operating software. SIFT (Software
Implemented Fault Tolerance) is an advanced computer concept developed by SRI
for the NASA Langley Research Center under Contract NAS1-15428 to support the
computational and reliability requirements of advanced fly-by-wire transport
aircraft.

This report complies with the requirements of Article IV Item D of SRI
Subcontract No. 14395.

Although this project constituted only a minor part of the SIFT effort,
considerable advances in concepts and implementation of software fault tolerance
were achieved under it. These are summarized in the paragraphs immediately
following. Part 1.2 of the Introduction provides an overview of the specific
modules for which fault tolerant designs were generated, the error reporter and
the global executive. Part 1.3 describes the organization of the body of this
report, and Part 1.4 acknowledges the contribution of individuals outside our
organization to this work.


1.1 ADVANCES IN SOFTWARE FAULT TOLERANCE IN THIS EFFORT 


Because the software in the SIFT operating system is essential for both
scheduling of application tasks and recovery from hardware failures, special
efforts have been made to verify this software in a formal manner. In addition,
it is being subjected to an extensive test program. Nevertheless, provision of
fault tolerance features was deemed desirable for selected portions of these
programs that have a key role in the recovery from failures. Note that this
software must perform in accordance with its specification in the presence of
faults in one or more of the component computers of SIFT or in their
interconnections.

The fault tolerance technique selected for this purpose is that of the recovery
block [RAND75]. Specific implementations of this technique to real-time
applications and the transport aircraft environment had already been described
prior to the effort reported here [HECH76, AERO78]. The basic structure for a
recovery block is

    Ensure T
    By P
    Else by Q
    Else Error

where T is an acceptance test condition, i.e. a condition which is expected to
be met by successful execution of either the primary routine P or the alternate
Q. The internal control of the recovery block transfers to Q if the test
condition is not met by executing P.
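The control flow of this structure can be sketched as follows. This is a minimal illustration in Python rather than the Pascal used for the SIFT routines; the names recovery_block and RecoveryBlockError, and the square-root routines in the usage lines, are hypothetical and do not appear in the SIFT software.

```python
class RecoveryBlockError(Exception):
    """Raised when neither routine passes the acceptance test ("Else Error")."""


def recovery_block(state, acceptance_test, primary, alternate):
    """Ensure T by P else by Q else error.

    Each routine is given the same unmodified input state, and its result
    is accepted only if acceptance_test(state, result) is true.
    """
    for routine in (primary, alternate):
        result = routine(state)
        if acceptance_test(state, result):
            return result
    raise RecoveryBlockError("no routine satisfied the acceptance test")


# Hypothetical usage: the primary is deliberately wrong for most inputs.
def accept(x, r):
    return abs(r * r - x) < 1e-6

def bad_primary(x):
    return x / 2.0

def good_alternate(x):
    return x ** 0.5

root = recovery_block(9.0, accept, bad_primary, good_alternate)
```

For the input 9.0 the primary result 4.5 fails the test and the alternate result is returned; for the input 4.0 the primary result 2.0 passes and the alternate is never run.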

The effectiveness of the fault tolerance provisions depends on the coverage of
the acceptance test and the avoidance of correlated failure mechanisms in P and
Q. Prior work had dealt primarily with software associated with a physical
process (e.g., attitude control), where the environment could be depended on to
furnish clues on the 'true' state of the process (e.g., by means of sensors
independent of those that furnished the primary input data).

The uses served by the fault tolerant modules for SIFT are of an intrinsically
logical nature, dealing with the reporting of errors and the action to be taken
after positive reports. For applications of this type, the environment does not
furnish independent clues, and the 'truth' has to be teased out of the logical
process itself. Although the routines to which fault tolerance was applied were
quite small, the work was therefore quite challenging. The main contribution of
the effort reported here to the field of fault tolerant software is the
evolution of a technique for formulating acceptance tests in logic oriented
applications based on conditions that are inherently orthogonal to the logic
implemented by the primary routine. A very clear example of this technique is
presented in the acceptance test for the error reporter in 2.2.

Further contributions will be found in the use of fault trees to identify the
requirements for acceptance tests and to determine the completeness of the
coverage of these tests. Some limitations of the recovery block technique were
encountered in constructing alternate routines that are truly independent of the
primary ones (and also of the acceptance test) for applications in which the
principal operations are addition and subtraction (comparison). In all cases it
was at least possible to change the order of operations and thereby to avoid
common sequence dependent failures. Greater independence might be achievable by
permitting alternate routines a larger scope (i.e., by letting one alternate
routine perform the computations carried out in several primary routines). This
concept seems worthy of exploration in future studies.


1.2. OVERVIEW OF THE FAULT TOLERANT ERROR REPORTER AND GLOBAL EXECUTIVE


SIFT achieves its high reliability by use of multiple processors with an excess
of computing capacity. When a single processor fails, it is configured out of
the system, a measure which ensures survival of the computer as a whole. Thus,
an important function of the SIFT operating system is the retiring of faulty
processors. A processor is defined as faulty if its output differs from those
of other processors for a given task. The SIFT error reporter and global
executive tasks collect information on disagreeing processors, process it, and
designate processors for retirement to the reconfiguration task.

The error reporter analyzes error data collected by the voter to determine what
processors appear to be faulty and indicates these in an error report. Because
a processor cannot report itself as faulty (even if the voter data would tend
to indict it), error reports from each processor may differ. The global
executive reviews all error reports, and if two or more processors point to a
third as being faulty, then the result is transmitted to the reconfiguration
task.


The error reporter and global executive have been made fault tolerant by
applying the recovery block principle described in section 1.1. Both tasks have
an acceptance test and recovery block associated with them. Thus, there now
exists a primary error reporter and global executive as well as alternates.
Very few changes were necessary to the primary routines in order to implement
the recovery blocks, and, with the exception of the addition of a single integer
variable, no changes were made to the remainder of the system software.

As noted above, the error reporter acceptance test establishes that all
processors with an excessive number of disagreements with the voter output are
detected, and ensures that no properly functioning processors are designated as
faulty. The alternate error reporter operates independently of the primary
routine, but produces an identical output. The acceptance test involves
approximately twenty Pascal statements, and the length of the alternate error
reporter is approximately the same as the primary. Thus, neither routine will
have a significant effect on the timing of the SIFT operating system.

The global executive acceptance test is coded in two modules: the first, which
is run before the primary routine, verifies that all input to the global
executive is current, and the second, which is run after the primary global
executive, checks for correct execution. If errors are detected by either
module of the acceptance test, the alternate global executive is invoked.

Execution of each of these routines is checked by the other. Thus, the global
executive checks on the execution of the error reporter acceptance test on each
processor by means of the frame count encoded in the error words. Similarly, an
output of the global executive which also has a frame count encoded within it is
checked by the error reporter in the subsequent frame. Notification to the
system is provided in the case of either error.

In addition to verifying correct execution of their immediately associated
primary routines, these acceptance tests can be expanded to give some indication
of the functioning of the reconfiguration task. If a processor indicated as not
working in the system status vector is generating error reports, then obviously
it has not retired. Although diagnosis of the discrepancy is beyond the scope
of the tasks of the software developed here, an indication is made to the system
that an off-normal condition exists, and appropriate action can be taken by the
operating system.

A major portion of the coding effort went toward the validation of the five 
Pascal procedures developed as part of the error reporter and global executive 
recovery blocks. Driver routines with approximately 8 to 10 times the amount of 
code in these routines were developed in order to adequately support the large
number of test cases which had to be run during validation. 


1.3. ORGANIZATION OF THE REPORT 


Section 2 describes the fault tolerant error reporter. Included are a
description of the acceptance test, the error conditions which it covers, a
description of the alternate routine, implementation requirements for
integration of the fault tolerant error reporter into the operating system, and
a description of the software validation. Section 3, which describes the fault
tolerant global executive, has a similar organization.


SECTION 2: ERROR REPORTER


The voter routine of each processor in SIFT maintains its own record of the
number of disagreements from the majority of all other processors. The SIFT
error reporter marks processors as being faulty based on the disagreement count
generated by the voter. The error reporter acceptance test compares the number
of recorded processor disagreements with the output of the error reporter, and
if processors are incorrectly characterized as working or failed, it invokes the
alternate routine.


2.1. ERROR REPORTER ACCEPTANCE TEST 


The SIFT voter routine marks individual processor disagreements from the
majority in an array designated as errors. The error reporter sets a bit in a
word called err for each processor with an excessive number of disagreements as
reported in errors. Bits 0 through 7 in err represent the correspondingly
numbered processors. The acceptance test checks that the error reporter was
invoked in the previous subframe, and calls the alternate error reporter upon
detection of a discrepancy between err and errors.

Figure 2.1 is a flow chart of the proposed error reporter acceptance test, and
figure 2.2 is a Pascal listing of the procedure which has been developed and
tested. The test counts the number of non-disagreeing processors in a counter
designated as right and outvoted processors in a counter designated as wrong.
It then checks the number of disagreements and the operational status of every
processor designated as faulty. A Boolean variable to invoke the alternate
error reporter is set to TRUE if a working processor marked as faulty has fewer
than the threshold number of disagreements. The final segment adds right and
wrong; if this sum does not equal the total number of processors, the acceptance
test will invoke the alternate error reporter.

If the error reporter acceptance test does not detect any failures, it writes
the frame count in the 8 most significant bits of err. When the global
executive acceptance test checks these bits for the frame count, it will verify
that the error reporter acceptance test has been executed in the current frame,
and that consequently, err reports are current. If a discrepancy between the
current frame and that encoded in the 8 most significant bits of err from a
particular processor is encountered, the global executive sets a corresponding
bit in an integer variable called mismatch along with the frame count in the 8
most significant bits. The error reporter acceptance test will then increment
errors in the appropriate position in the subsequent frame. Thus, failure to
execute the error reporter in the current frame will increase the likelihood
that the processor will be retired by the alternate error reporter.
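The packing of the frame count into the upper 8 bits of the 16-bit error word can be sketched as follows. This is a minimal Python illustration, not part of the SIFT code; the function names are hypothetical. Unlike the Pascal statement in figure 2.2, which adds the stamp to err and so relies on the upper bits being zero, this sketch masks the processor bits explicitly.

```python
def stamp_frame(err_bits, framecount):
    """Place framecount mod 256 in the upper 8 bits, processor bits in the lower 8."""
    return (err_bits % 256) + 256 * (framecount % 256)


def unpack(word):
    """Return (frame stamp, processor bits) from a stamped 16-bit word."""
    return word // 256, word % 256


def report_is_current(word, framecount):
    """True if the word was stamped during the frame numbered framecount."""
    stamp, _ = unpack(word)
    return stamp == framecount % 256


# Processors 0 and 2 flagged, stamped in frame 300 (300 mod 256 = 44).
word = stamp_frame(0b000101, 300)
```

A word stamped in frame 300 is accepted as current when checked in frame 300 and rejected when checked in frame 301, which is the condition under which the global executive sets the corresponding bit in mismatch.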


2.2. COVERAGE OF THE ERROR REPORTER ACCEPTANCE TEST 


The error reporter acceptance test detects the following faults: 
Figure 2.1. Flow chart of Error Reporter Acceptance Test


PROCEDURE ACCEPTANCE_TEST;

(*error reporter acceptance test*)

VAR
    EXCOUNT, WRONG, RIGHT, DIVISOR, CHECK, I, J, MISM: INTEGER;
    FAILFLG: BOOLEAN;

begin
    excount := mismatch div 256;

    (*check execution count of global exec*)
    if (framecount mod 256) <> (excount - 1) then erfails := true;

    (*erfails is a global variable which notifies
      the system that the global exec has not run*)

    mism := mismatch mod 256;   (*low 8 bits: one mismatch flag per processor*)
    wrong := 0;
    failflg := false;
    right := 0;
    divisor := 1;

    for j := 0 to maxprocessors do      (*check for omission errors*)
    begin
        (*processor has 1 strike against it if
          error reporter didn't run in prev. frame*)
        if odd(mism div divisor) then errors[j] := errors[j] + 1;

        if (errors[j] < threshold) and (working[j])
            then right := right + 1;    (*count for omissions test*)

        check := err div divisor;       (*shift err appropriate
                                          no. of places to the right*)
        if odd(check) then
        begin
            wrong := wrong + 1;         (*count for omissions test*)
            if (errors[j] < threshold) and (working[j])
                then failflg := true    (*check for false positives*)
        end;

        divisor := divisor * 2;
    end;

    if wrong + right <> maxprocessors + 1 then failflg := true;
                                        (*omissions test*)
    if failflg then alt_error_reporter
    else err := err + 256 * (framecount mod 256);
end;


Figure 2.2. Pascal listing of Error Reporter Acceptance Test

(1) failure to invoke the error reporter during each frame

(2) failure to report processors with an excessive number of
disagreements as faulty to the global executive, and

(3) designation of a properly functioning processor as faulty

The validity of the input to the test (e.g. framecount, working, and errors) is
not checked, and it is possible that errors in these variables could be
propagated into err. However, to a certain extent, these failures are covered
by other processor error reports in the global executive.

The primary consideration in the design of this acceptance test was that the
verification and failure detection be performed in a manner independent of the
primary error reporter. The following subsections describe the means by which
the errors listed above are detected.


2.2.1. Failure to Execute During Each Frame

As noted above, the global executive acceptance test checks the frame count mod
256 encoded in the front part of each error report. Consequences of the
failure to execute the error reporter on a given processor are limited; a
consistent pattern of failures will be detected by means of the error reports of
other processors. Discrepancies will ultimately lead to the retiring of
processors which do not execute the error reporter. The present acceptance test
implementation calls for the retirement of the processor if any other
discrepancy from the system (i.e. voter) output occurs.

Just as the global executive checks execution of the error reporter, the
converse also occurs. If the frame count encoded in the front eight bits of
mismatch minus the frame count mod 256 is not equal to 1, then the global
executive acceptance test has not been executed in the previous frame, and the
system is notified. Failure to execute the global executive may result in more
serious consequences than failure to execute the error reporter, and the "one
count against you" strategy described in the previous paragraph is not
appropriate.


2.2.2. Failure to Report Processors with an Excessive Number of
Disagreements to the Global Executive

In order to achieve independence from the primary error reporter algorithm, the
acceptance test checks for this failure indirectly by testing for the following
conditions:

(a) the total number of processors reported as faulty is correct, and

(b) all processors designated as faulty have at least the threshold
number of disagreements

In this acceptance test, the number of processors with less than the threshold
number of disagreements is counted in a variable designated as right, and the
number having excess disagreements is counted in a second counter labeled wrong.
If the sum of wrong and right is equal to the total number of processors, then
the error reporter can be shown to have performed correctly when the third part
of the acceptance test, described in the following section, has not detected any
failures. This acceptance test is a particularly clear example of using
algorithms which are orthogonal to the primary routine.


2.2.3. Designation of a Properly Functioning Processor as Faulty

The final part of the acceptance test is to ensure that all processors
designated as malfunctioning have at least the threshold number of
disagreements. This determination is made by checking the number of
disagreements of these processors. If any values of the array are below the
threshold for working processors marked as faulty, then the primary error
reporter has failed, and the alternate is invoked.
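The way the counting test of 2.2.2 and the false-positive test of 2.2.3 complement each other can be illustrated with a short sketch. It is written in Python for convenience and mirrors the logic of the listing in figure 2.2 without the frame count handling; the function name and the six-processor data are illustrative assumptions.

```python
def acceptance_test_rejects(err_bits, errors, working, threshold):
    """Return True if the primary error word should be rejected."""
    n = len(errors)
    right = 0      # working processors below the disagreement threshold
    wrong = 0      # processors flagged in the error word
    fail = False
    for j in range(n):
        healthy = errors[j] < threshold and working[j]
        if healthy:
            right += 1
        if (err_bits >> j) & 1:
            wrong += 1
            if healthy:
                fail = True        # false positive (2.2.3)
    if right + wrong != n:
        fail = True                # omission (2.2.2)
    return fail


# Six working processors; only processor 1 has reached the threshold of 2.
errors = [0, 3, 0, 0, 0, 0]
working = [True] * 6
```

With these data the correct word 0b000010 is accepted. An empty word is rejected by the count alone. The word 0b000100, which flags the wrong processor and misses the faulty one, leaves the count intact and is caught only by the false-positive check, which is why both parts are required.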


2.3. ALTERNATE ERROR REPORTER 


Independence in structure and operation from the primary error reporter was
a chief objective in the alternate routine design. In addition, its output had
to be compatible with the global executive.

These requirements resulted in a routine which is essentially the inverse of the
primary error reporter. An alternate error word, designated as erra, is
initially set to all 1's; the alternate error reporter sets erra bits to 0 if
the number of disagreements in the appropriate element of the errors array is
less than the threshold. If there are more bits in erra than there are
processors (e.g. if there are six processors and eight bits in erra), the
leading bits are set to 0. Finally, the primary error word, err, is set equal
to erra, loaded with frame count information, and placed in the pre-broadcast
buffer. The complementary nature of this routine is maintained in the order of
setting the error word bits: the processors are checked in ascending order
rather than the descending order used in the primary error reporter. Figure 2.4
is a Pascal listing of the alternate error reporter.
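The inverse construction can be compared with a conventional one in a few lines. The sketch below is in Python; alternate_sketch follows the arithmetic of the Pascal listing in figure 2.4 without the frame count and broadcast steps, while primary_sketch is a hypothetical stand-in, since the primary error reporter is not listed in this report. It is defined here only so that the two outputs can be compared.

```python
ALL_ONES = 0o377   # eight 1 bits, the constant 377B of the Pascal listing


def primary_sketch(errors, working, threshold):
    """Hypothetical primary: start from 0 and set bits in descending order."""
    err = 0
    for i in reversed(range(len(errors))):
        if errors[i] >= threshold or not working[i]:
            err |= 1 << i
    return err


def alternate_sketch(errors, working, threshold):
    """Alternate: start from all 1's and clear bits in ascending order."""
    erra = ALL_ONES
    k = 1
    for i in range(len(errors)):       # at most 8 processors
        if errors[i] < threshold and working[i]:
            erra -= k                  # clear the bit of a healthy processor
        k *= 2
    erra -= ALL_ONES - k + 1           # remove leading bits beyond the last processor
    return erra
```

For six working processors with errors = [0, 3, 0, 0, 0, 0] and a threshold of 2, both functions return 2 (bit 1 set). The two routines share only the comparison against the threshold, so a sequencing fault in one is unlikely to be repeated in the other.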


2.4. IMPLEMENTATION REQUIREMENTS


As noted previously, the acceptance test and the alternate error reporter are
short and relatively simple procedures which were written to be compatible with
the SIFT operating system. Additional local variables are required as shown in
the listings for the error reporter acceptance test and alternate routine. In
addition, some modifications to the primary error reporter are necessary to
enable it to transmit processor states to the global executive and execution
information to the acceptance test. No changes in the broadcasting protocol are
required.

Figure 2.3. Flow chart of Alternate Error Reporter


PROCEDURE ALT_ERROR_REPORTER;

(*this is the alternate error reporter*)

CONST
    ALLONES = 377B;

VAR
    ERRA: INTEGER;      (*alternate error word*)
    I, K: INTEGER;

begin
    erra := allones;
    k := 1;

    for i := 0 to maxprocessors do
    begin
        if (errors[i] < threshold) and (working[i])
            then erra := erra - k;
        k := k * 2;
    end;

    erra := erra - (allones - k + 1);   (*remove leading bits*)
    err := erra + 256 * sfcount;
    prebroadcast(errerr, err);
end;

Figure 2.4. Pascal listing of alternate error reporter 


2.5. ERROR REPORTER RECOVERY BLOCK VALIDATION


The major objective of the testing performed on the error reporter recovery
block was to provide a comprehensive set of cases which would demonstrate
satisfactory performance when the error reporter was functioning properly and
when it had failed. Figure 2.5 shows the top level fault tree that was used to
define this set. The recovery block fails if the primary error reporter fails
without detection by the acceptance test, or if the alternate fails after being
invoked by the error reporter acceptance test. Failure due to an undetected
primary routine fault will occur when both the primary routine fails and the
acceptance test does not detect it. The same potential failures affect the
acceptance test and the alternate routine, and thus they were both validated
simultaneously.

Figure 2.6 continues the development. There are two major classes of errors:
failure to identify a processor with excess disagreements, and reporting a
processor with less than two disagreements in the error report. Under the first
class of errors, one, two, or three processors could remain unidentified.
Further expansion of the tree shows that failure to identify two outvoted
processors is caused by failure to identify the first processor and failure to
identify the second. Similarly, failure to identify three processors having
excess disagreements can be broken down into failure to identify the first
processor and failure to identify the second and failure to identify the third.

Figure 2.7 continues this development. Any of the six processors could be
identified as the first failure. Once the validation has established that the
error reporter acceptance test and alternate can correctly identify the first
error committed by the primary routine (i.e. failure to identify one processor
with an excess number of disagreements), validation for the condition of two
outvoted processors can be performed by holding fixed the first processor with
excess disagreements and only varying the second. Thus, processor 0 is
assigned the first error, and processors 1 through 5 are each, in turn, given
an excess number of disagreements in the errors array. Similar logic applies to
the third and fourth processors with excess disagreements.

Figure 2.8 is a further development of the fault tree which summarizes the
pattern in which the processors are tested. Transfer 1011 shows that all six
processors are tested for the case in which the primary error reporter fails to
detect one processor with excess disagreements. Transfer 1012 shows that when
two disagree excessively, the primary error reporter is always assumed to have
detected an excess disagreement condition in processor 0, and that the
acceptance test and alternate are tested with the second error in processors 1
through 5. For failure of the primary error reporter to detect a third
excessively disagreeing processor, transfer 1013 shows that processors 0 and 1
are assumed to be the first two, and the third occurs in processor 2, 3, 4, or
5. Finally, for four errors, processors 0, 1, and 2 are assumed to have excess
disagreements, and the final error varies between processors 3, 4, and 5.
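The reduction in test cases obtained by holding the earlier processors fixed can be made concrete with a short enumeration. The sketch is in Python; the function name is hypothetical, and keying the result by the transfer numbers 1011 through 1014 is an illustrative choice.

```python
from math import comb


def detection_failure_cases(n_processors=6, max_faulty=4):
    """Enumerate processor sets tested under transfers 1011 through 1014.

    For m faulty processors, processors 0..m-2 are held fixed and only the
    last faulty processor is varied over the remaining positions.
    """
    cases = {}
    for m in range(1, max_faulty + 1):
        fixed = tuple(range(m - 1))
        cases[f"101{m}"] = [fixed + (p,) for p in range(m - 1, n_processors)]
    return cases


cases = detection_failure_cases()
reduced_total = sum(len(v) for v in cases.values())
exhaustive_total = sum(comb(6, m) for m in range(1, 5))
```

For six processors this gives 6, 5, 4, and 3 configurations for one through four faulty processors, 18 in all, against 56 if every combination of one to four faulty processors were exercised.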

The fault tree for the second class of errors, spurious identification of
correctly functioning processors as having excessive disagreements, is shown in
figures 2.8 and 2.9. Incorrect identification of a processor as malfunctioning
can occur when there are either no disagreements or a single disagreement.
Incorrect characterization of the processor can also occur when there are one,
two, or three other processors which actually have excessively disagreed with
the voter output. As previously, not all processors need to be considered. The
testing scheme in this case is to ensure that the error reporter acceptance test
can detect a false failure of each processor when any other processor has
failed. Table 2.1 is a list of the validation tests required to verify the
correctness of the error reporter acceptance test and alternate error reporter
based on the fault trees described here.

Complete testing required simulation of a major portion of the SIFT operating
system. The simulation program, called DRIVER, prepares the errors and working
arrays of the voter and the err word of the error reporter based on external
inputs. It next invokes the acceptance test, outputs results, and invokes the
alternate if an error is detected. Appendix A shows a complete listing of the
program.

Figure 2.5. Top level tree for Error Reporter Failures

Figure 2.6. Classes of Error Reporter Failures

Figure 2.7. SIFT configurations used for detection failure validations

[Fault tree figure: "Incorrect Characterization of a Functional Processor as
Failed", with a tabular "OR" over the inputs "Processor n has excess
disagreements" for n = 0 through 5.]

Figure 2.[?]. Final Development of Figure 2.8

TABLE 2.1. FAULTS FOR WHICH VALIDATION TESTING IS REQUIRED
FOR THE ERROR REPORTER ACCEPTANCE TEST AND ALTERNATE ERROR REPORTER


Fault Tree
Designation    Description

1011           Failure to detect primary error reporter's not
               identifying a single processor as having excess
               disagreements.

1012           Failure to detect primary error reporter's not
               identifying a second processor as having excess
               disagreements given that the first has been
               identified.

1013           Failure to detect primary error reporter's not
               identifying a third processor as having excess
               disagreements given that the first two have been
               identified.

1014           Failure to detect primary error reporter's not
               identifying a fourth processor as having excess
               disagreements given that the first three have been
               identified.

1110A          Failure to detect primary error reporter's false
               identification of a functional processor as having
               excess disagreements given that no other processor
               has failed.

1110B          As 1110A, given that 1 other processor failed.

1110C          As 1110A, given that 2 other processors failed.

1110D          As 1110A, given that 3 other processors failed.

1111           Failure to detect primary error reporter's false
               identification of a functional processor as having
               excess disagreements given that one other processor
               has failed.


SECTION 3: GLOBAL EXECUTIVE


This section describes the acceptance test and alternate routine for the SIFT
global executive. The acceptance test is coded in two modules: the first,
which is run before the primary routine, verifies that all input to the global
executive is current, and the second, which is run after the primary global
executive, checks for correct execution. If execution errors are detected by
either module of the acceptance test, the alternate global executive is invoked.


3.1. GLOBAL EXECUTIVE ACCEPTANCE TEST 


The August, 1980 version of the SIFT operating system has the error reports for
the active processors contained in the array prevote[errerr,*], where errerr is
a constant set to 1. The error reports themselves are contained within the 8
least significant bits of each 16-bit element of prevote, and the frame count is
encoded in the 8 most significant bits by means of the error reporter acceptance
test. The global executive reads successive bits of each prevote element by
shifting the word to the right. Because of this destructive read, it is
necessary to reproduce the error report information prior to execution of the
primary routine. This task is performed by the first module of the global
executive acceptance test, designated PREGEXEC. PREGEXEC also checks on the
frame count which has been encoded by the error reporter acceptance test. After
execution of the primary global executive, the second module of the global
executive acceptance test, called GEXECTEST, is executed. GEXECTEST checks each
position of each word in an order orthogonal to the primary global executive.
It then compares this result with the appropriate bit in RECONF, the retirement
word generated by the primary routine. If there is a discrepancy, the alternate
global executive, ALTGEXEC, is called. ALTGEXEC is described in section 3.

Figures 3.1 and 3.2 are flow charts of the two modules of the global executive
acceptance tests, and figure 3.3 contains the corresponding listings. The first
module of the acceptance test, PREGEXEC, checks the framecount contained in the
most significant 8 bits of each error report which have been written by the
error reporter acceptance test, and then recopies the least significant half of
the word into the most significant position in order to preserve them for the
second module of the global executive acceptance test.

Those error words containing frame counts different from that of the system are
set to zero as a means of masking them from the global executive, and a failure
counter for the processor error report is incremented. The subsequent execution
of the error reporter on other processors will count this indicator as a
disagreement when writing their reports, an action which will result in
retirement of this processor if at least one other discrepancy is detected.
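The two duties of PREGEXEC described above, checking the frame stamp and duplicating the low byte into the high byte, can be sketched as follows. This Python fragment is illustrative only and is not the listing of figure 3.3. The function name and the returned list of stale processors are assumptions; the report states only that a failure counter is incremented.

```python
def pregexec_sketch(reports, framecount):
    """Mask stale error words and duplicate the low byte of current ones.

    reports: one 16-bit word per processor (frame stamp in the upper 8 bits,
    processor bits in the lower 8). Returns (prepared words, stale processors).
    """
    stamp = framecount % 256
    prepared = []
    stale = []
    for p, word in enumerate(reports):
        if word // 256 != stamp:
            prepared.append(0)      # mask the report from the global executive
            stale.append(p)         # stands in for incrementing the failure counter
        else:
            low = word % 256
            # Copy survives the destructive right-shift read of the low byte.
            prepared.append(low * 256 + low)
    return prepared, stale


# Frame 7: processor 1 still carries the stamp of frame 6.
prepared, stale = pregexec_sketch([7 * 256 + 2, 6 * 256 + 2, 7 * 256], 7)
```

In this example the first word becomes 514 (the value 2 in both bytes), the second is masked to zero and recorded as stale, and the third is an empty but current report.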

Once the primary global executive has been run, the second module of the
acceptance test checks the correctness of its execution, and invokes the
alternate routine upon detection of an error. A major consideration in the
design of the acceptance test was that it be independent of the primary routine.
Thus, whereas the primary checks each position of an error report before moving
on to the next, the acceptance test checks a given position of all error reports
before moving up to the next position. A second difference between the primary
global executive and the acceptance test is the lack of an intermediate array
(i.e. procs) for the storage of excess disagreements. Thus, once the number of
discrepancies in a given position has been counted, it is immediately compared
with the corresponding value in the reconfiguration word RECONF. If a
discrepancy is detected, a flag is set that will result in the invocation of the
alternate routine.

Figure 3.2. Flow chart of Global Executive Acceptance Test

Figure 3.2 (continued). Flow chart of Global Executive Acceptance Test


If a processor indicated as retired in the working array is indicated as having
excess disagreements in the input processor error reports, then one of three
conditions exists: (1) a processor marked for retirement is still a functioning
part of the system, (2) an error exists which affects the state of the working
array, or (3) the error report(s) input to the global executive are not valid.
Although the global executive can detect this discrepancy, it cannot by itself
isolate which of these three conditions caused the anomaly. As a result, the
global executive acceptance test and alternate logic note the discrepancy to the
system, but disregard the error reports in preparation of the reconf word.


3.2. COVERAGE OF THE GLOBAL EXECUTIVE ACCEPTANCE TEST 

The global executive acceptance test described above detects the following 
faults: 

(1) failure to invoke the error reporter acceptance test,

(2) failure to retire processors reported by at least two other processors
as having an excess number of disagreements with the voter result, and

(3) marking for retirement processors which do not have an excess number
of disagreements.

Detection of the first fault occurs in the first module of the global executive
acceptance test, PREGEXEC. Two probable causes of the discrepancy are: (1)
incorrect execution of the error reporter recovery block and (2) no invocation
of the error reporter acceptance test. In either case, information reaching the
global executive is suspect, and should be disregarded. If the rest of the
system is properly functioning, the only penalty for no retirement at this point
would be the unnecessary overhead necessitated by the higher number of active
but not functional processors. Because the discrepancy is a processor
disagreement from a majority vote, it should be counted in the total of the
error reports of the other processors. If any other single disagreement occurs,
the processor would be retired at the end of the next frame.

GEXECTEST detects both the second and third faults listed above. The number of
processor disagreements registered in each processor error report is counted;
retired or self-reporting processor disagreements are ignored. If the
corresponding position in the reconfiguration word is zero when there are two or
more reports which have bits set, or the reconfiguration word has a bit set when
fewer than two (i.e. one or zero) processors are reported in the error words,
then a boolean variable failflg is set. The alternate global executive is
invoked if failflg is TRUE.
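
The check described above can be modelled in a few lines. The following is a minimal Python sketch, not the SIFT code itself: it assumes each error report is a plain 6-bit word in which bit i set means "this processor reports processor i", and the names gexec_accept, reports, working and reconf are chosen here for illustration only.

```python
def gexec_accept(reports, working, reconf):
    """Return True when reconf is inconsistent with the error reports.

    reports[j] -- error word from processor j; bit i set means j reports i
    working[j] -- False when processor j is already retired
    reconf     -- bit i set means the primary marked processor i for retirement
    """
    n = len(reports)
    failflg = False
    for i in range(n):                    # one bit position per reported processor
        votes = 0
        for j in range(n):                # one error report per reporting processor
            bit = (reports[j] >> i) & 1
            if not working[j] or i == j:  # ignore retired and self-reporting processors
                bit = 0
            if not working[i]:            # reports against a retired processor are not counted
                bit = 0
            votes += bit
        marked = (reconf >> i) & 1
        if marked and votes < 2 and working[i]:
            failflg = True                # retirement without two supporting reports
        if not marked and votes >= 2:
            failflg = True                # missed retirement
    return failflg


all_working = [True] * 6
# Processors 1 and 2 both report processor 0.
two_reports = [0, 1, 1, 0, 0, 0]
print(gexec_accept(two_reports, all_working, reconf=0))  # primary missed the retirement
print(gexec_accept(two_reports, all_working, reconf=1))  # primary agrees with the reports
```

A True result corresponds to failflg being set, i.e. the point at which the alternate global executive would be invoked.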
PROCEDURE PREGEXEC;
(*This procedure copies the least significant bits of the
  error reporter word bits into the most significant positions
  after checking the frame number *)
VAR
   excount: INTEGER;
   ERR: INTEGER;
   J,M: INTEGER;
begin
   mismatch:=0;   (*mismatch is a global integer
                    variable used for marking
                    procs. not running ertask*)
   for j:=0 to maxprocessors do begin
      excount:=prevote[errerr, j] div 256;
      err:=prevote[errerr, j] mod 256;
      if excount=(framecount mod 256) then
         prevote[errerr, j]:=257*err
         (*copy least sig. bits to most sig. position
           if frame count OK*)
      else mismatch:=mismatch + 1;
      (*otherwise send word to error reporters in subsequent
        frame*)
      mismatch:=mismatch * 2;
   end;
end;


Figure 3.3. Listing for Global Executive Acceptance Test: PREGEXEC



PROCEDURE GEXECTEST;
(*Global Executive Acceptance test*)
TYPE
   ZERO_ONE=0..1;
VAR
   DIVISOR,CHECK,I,J,SUM:INTEGER;
   FAILFLG:BOOLEAN;
   LAST_DIG:ZERO_ONE;
begin
   divisor:=1;
   failflg:=false;
   for i:=0 to maxprocessors do begin
      (*...do for each position of report*)
      (*This procedure is written under the assumption that the primary
        global executive has rotated the error reports a total of 8
        positions. If this is not the case, additional division by
        (8 - 1 - maxprocessors)*2 for each error report is necessary *)
      sum:=0;
      for j:=0 to maxprocessors do begin
         (*...do for each error report*)
         last_dig:=(prevote[errerr, j] div divisor)
                   mod 2;
         if (not working[j]) or (i=j)
            then last_dig:=0;
         if (not working[i]) and (odd(last_dig))
            then begin
               recfail:=recfail+divisor;
               (*recfail is a global integer
                 showing a retired proc. working*)
               last_dig:=0;
            end;
         sum:=sum + last_dig;
      end;
      check:=reconf div divisor;
      if odd(check)
         then begin
            if (sum<2) and (working[i]) then failflg:=true
         end
         else if sum>=2 then failflg:=true;
      divisor:=divisor*2;
   end;
   if failflg then altgexec;
   mismatch:=mismatch + 256*(framecount mod 256);
   (* indicate successful completion of acceptance test
      to error reporters of next frame *)
end;


Figure 3.3 (continued). Listing of Global Executive Acceptance Test: GEXECTEST.



3.3. ALTERNATE GLOBAL EXECUTIVE 


The alternate global executive, ALTGEXEC, performs a function identical to the
primary routine, but in an independent manner. The flow chart and listing for
this procedure are shown in figures 3.4 and 3.5. Input to the alternate routine
is the same as that used by the acceptance test, i.e. the error reports
replicated by PREGEXEC. Unlike the primary routine, ALTGEXEC sums the totals of
the disagreeing processors in descending order, and stores these totals in an
integer array. If a total in this array is less than two, then a zero is
placed in the corresponding position of an alternate reconfiguration word,
reconfa. Otherwise, the position is set to 1. A second difference between the
primary and alternate is that the error words are not destructively read, and
can be saved by the system if desired. As a final step of execution, ALTGEXEC
sets the value of the primary reconfiguration word to that of the alternate.
The primary reconfiguration word value can also be saved prior to execution of
this step.
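
The counting scheme described above can be sketched as follows. This is an illustrative Python model under the assumption that each error report is a 6-bit word with bit k standing for processor k; the function name alt_gexec and its arguments are not part of the SIFT software.

```python
def alt_gexec(reports, working):
    """Build an alternate reconfiguration word from the error reports.

    reports[j] -- error word from processor j; bit k set means j reports k
    working[j] -- False when processor j is retired (its report is skipped)
    """
    n = len(reports)
    ercount = [0] * n
    for j in range(n - 1, -1, -1):        # each error report, in descending order
        if not working[j]:
            continue
        for k in range(n - 1, -1, -1):    # each position of the report
            if j != k and (reports[j] >> k) & 1:
                ercount[k] += 1           # the report word itself is left unchanged
    reconfa = 0
    for k in range(n):
        if ercount[k] >= 2:               # two or more reports: mark for retirement
            reconfa |= 1 << k
    return reconfa


# Processors 1 and 2 report processor 0; the alternate word marks position 0.
print(alt_gexec([0, 1, 1, 0, 0, 0], [True] * 6))
```

Because the reports are only read, the caller still holds the original error words afterwards, which mirrors the non-destructive read noted in the text.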


3.4. IMPLEMENTATION REQUIREMENTS 


Three new procedures, GEXECTEST, PREGEXEC, and ALTGEXEC, are required for the
operating system. PREGEXEC must be invoked prior to the execution of the
primary global executive (GEXECTASK), and GEXECTEST is executed at its
completion. This latter routine will invoke procedure ALTGEXEC, the alternate
global executive, if required. Although the routines are presently declared as
procedures, they may be changed to functions in order to be compatible with the
form of GEXECTASK.

An additional global integer variable, called mismatch, is required. Frame
count discrepancies detected in the PREGEXEC routine are recorded in a manner
similar to processor error reports, i.e. by placing a "1" in the appropriate
position of the word. The error reporters of other processors will read mismatch
and increment the error counter for the appropriate processor if PREGEXEC
reports a frame count disagreement.

A second global integer variable, designated as recfail, is used to enable the
global executive to indicate the unsuccessful retirement of a failed processor.
As is the case with mismatch, the faulty processor is noted by a "1" in the
appropriate position. As noted previously, the global executive is not capable
of determining whether the processor actually did not respond to the
reconfiguration order for retirement or whether the "working" array is incorrect,
and thus no further action can be taken by the global executive.

Changes in the values of each element of the prevote[errerr, ] array will occur
due to the implementation of the fault-tolerant error reporter and global
executive. As noted previously, PREGEXEC requires the frame count be encoded in
the first half of the error report from each processor by the error reporter
recovery block. In addition, the least significant bits of the error reports
are replicated in the most significant positions by PREGEXEC. It is not
anticipated that these changes have any impact on the rest of the SIFT
executive.
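
The word layout described above can be illustrated with a short Python sketch that follows the PREGEXEC listing step by step. It assumes 16-bit report words with the frame count in the upper byte and the error bits in the lower byte; the function name pregexec and the returned tuple are conveniences of this sketch, not part of the SIFT executive.

```python
def pregexec(words, framecount):
    """Check the frame count in each report and replicate the error bits.

    words      -- one report word per processor: 256 * frame count + error bits
    framecount -- frame count expected by the global executive
    Returns the rewritten words and the mismatch word.
    """
    mismatch = 0
    out = []
    for w in words:
        excount, err = divmod(w, 256)    # upper byte, lower byte
        if excount == framecount % 256:
            out.append(257 * err)        # 256*err + err: error bits now in both bytes
        else:
            out.append(w)                # word left as received
            mismatch += 1                # note the frame count disagreement
        mismatch *= 2                    # shift, as in the listing
    return out, mismatch


words, mismatch = pregexec([7 * 256 + 5], framecount=7)
print(words[0] // 256, words[0] % 256, mismatch)
```

Multiplying by 257 is what copies the low byte into the high byte: 257 * err equals 256 * err + err, and err is below 256.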
Figure 3.4 (continued). Flow chart for ALTGEXEC.

PROCEDURE ALTGEXEC;
(*This is the alternate global executive*)
const maxdiv=32;
VAR
   RECONFA,DIVISOR,MULT,J,K,L,M:INTEGER;
   ERCOUNT:PROCINT;
   LAST:INTEGER;
begin
   for j:=0 to maxprocessors do ercount[j]:=0;
   (*initialize ercount*)
   FOR J:= maxprocessors downto 0 do
      if working[j] then begin
         (*...do for each error report*)
         divisor:=maxdiv;
         for k:=maxprocessors downto 0 do begin
            (*...do for each position of report*)
            if j=k then last:=0
            else last:=prevote[errerr, j] div divisor;
            if odd(last) then ercount[k]:=ercount[k]+1;
            divisor:=divisor div 2;
         end
      end;
   (*...now write reconfa*)
   reconfa:=0;
   mult:=1;
   for l:=0 to maxprocessors do begin
      if ercount[l]>=2 then reconfa:=reconfa+mult;
      mult:=mult*2;
   end;
   pre_broadcast(gexecreconf, reconfa);
end;


Figure 3.5. Listing of Alternate Global Executive
3.5. VALIDATION


The critical nature of the global executive acceptance test and the alternate
global executive necessitates a comprehensive set of validation tests in order
to demonstrate that the incorporation of these routines into the SIFT executive
system does not negatively impact the overall reliability.

An exhaustive set of tests would involve testing each bit of each error report
[...] the appropriate response for every possible configuration of [...]
bits, the configuration of the [...] reconf word. For six processors, there are
a total of 281 [...] these variables, a rather intimidating number. However,
the need for comprehensive testing remains. Thus, a major portion of the [...]
devoted to the choice of an appropriate subset of these [...] conclusively
demonstrate that the global executive recovery block does not contain errors.
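
The scale of an exhaustive test can be illustrated with a small calculation. The breakdown below is an assumption made for this sketch, since the original sentence is only partly legible: six error reports of six bits each, six bits for the working array, and six bits for the reconf word. Under that assumption the count begins with the digits 281, which is consistent with the figure quoted above.

```python
processors = 6

# Assumed state variables (not confirmed by the legible text):
report_bits = processors * processors   # six 6-bit error reports
working_bits = processors               # one flag per processor
reconf_bits = processors                # one bit per processor

total_bits = report_bits + working_bits + reconf_bits
states = 2 ** total_bits

print(total_bits, states)
```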

A fault-tree methodology was used to reduce the number of tests to a manageable
number. The objective was to develop the trees to a [...]
the primal events, i.e. those at the bottom of the tree, could be tested by a
reasonable number of cases. If this testing showed that an insufficient number
of primal events existed to make the top event (failure of the global executive)
true, then the validation would be complete.

The highest level tree is shown in figure 3.6. The top event, [...]
global executive recovery block, can be caused by either (1) failure of the
primary global executive and failure of the acceptance test [...]
failure or (2) the acceptance test invoking the alternate routine and failure of
[...] routine. For the purposes of this analysis failure of the primary
routine is a given, and thus failures of the acceptance test and the alternate
must be considered. However, because these routines function together as one
unit, they are tested together in the validation procedure. Moreover, they both
perform the same operation, i.e. determining the number of [...]
discard a processor, and thus are subject to the same [...]
subsequent levels of development of these fault trees apply to both routines.

The next level of development shows the potential failed states [...]
acceptance test and the alternate global executive, which, as noted above, are
[...] number of agreeing error reports).

Figure 3.7 is the tree for the first class of failures: one, two or three faulty
processors remaining unidentified. Validation test 1010 will test [...]
for each possible state of the error reports which [...]
processor as having failed. A large reduction in the number [...]
error reporter words can be achieved by consideration of the criteria for
retirement: in order for a processor to be retired, the error reports of two
other processors must indicate it had more than 2 disagreements [...]
in the previous frame. Thus, if the recovery block can be shown [...]
two working processors indicating a third processor as having failed, then it
will designate this failure if more than two processors so report.

As is shown in figure 3.8, failure to identify a single faulty processor may
occur when 0, 1, 2, or 3 processors have been retired by the reconfiguration
task. Considering all permutations of the SIFT configuration would lead to an
impractical number of test cases, and the following logic describes the
reduction in the validation process: there is only 1 SIFT configuration when
all processors are working, and six possible configurations if a single
processor is retired. These configurations are tested with all permutations of
two processors indicating a third as faulty. Once the validation has established
that the global executive can correctly identify a failed processor with any
single processor configured out of the system, validations for two processors
configured out need only consider cases where the first retirement is held fixed
(at processor 0) and the second is varied among the remaining 5. A similar line
of reasoning can be used to consider three retired processors. Figure 3.9 shows
the pattern of SIFT configurations that are tested for the single faulty
processor case.
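
The reduction argued above can be made concrete by enumerating the configurations. The Python sketch below is illustrative only. It assumes that the three-retired case fixes processors 0 and 1 and varies the third (the text says only that a similar line of reasoning applies), and it treats "two processors indicating a third" as an unordered pair of reporters.

```python
from itertools import combinations

N = 6  # processors in the SIFT configuration considered here


def reduced_configurations(n=N):
    """Sets of retired processors covered by the reduced validation."""
    configs = [frozenset()]                                  # all processors working
    configs += [frozenset({p}) for p in range(n)]            # any single retirement
    configs += [frozenset({0, p}) for p in range(1, n)]      # first fixed at 0
    configs += [frozenset({0, 1, p}) for p in range(2, n)]   # assumed: 0 and 1 fixed
    return configs


def cases_for(retired, n=N):
    """(faulty, reporter, reporter) triples among the working processors."""
    working = [p for p in range(n) if p not in retired]
    cases = []
    for faulty in working:
        others = [p for p in working if p != faulty]
        for a, b in combinations(others, 2):
            cases.append((faulty, a, b))
    return cases


configs = reduced_configurations()
total_cases = sum(len(cases_for(c)) for c in configs)
# Every configuration with at most three retired processors, for comparison.
full_configs = sum(1 for r in range(4) for _ in combinations(range(N), r))

print(len(configs), full_configs, total_cases)
```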

The next errors covered in this branch are the failure to identify two and three
processors as having failed. In principle an exhaustive test should cover each
possibility of two or three processors having failed. However, as implied in
the fault tree, this can be broken into the failure to detect the first faulty
processor, failure to detect the second, and failure to detect the third (if
applicable). The failure to detect the first processor when no other processors
have failed has been covered in test 1010A, along with arguments which extend
the validity of this test to all states of working and reconf. This same
argument can be easily extended to cover the case of more than one processor
having failed.

Table 3.1 illustrates the validation tests required to cover all failure
possibilities under the tree 1000. The validation procedure calls for processor
0 to be designated as faulty by processors 1 and 2, and that processors 1 through
5 be tested in turn in a manner similar to the single processor failed
validation described above. An analogous line of reasoning can be used for the
validation of the third processor failed case: processors 0 and 2 designate
processor 1 as failed, processors 1 and 2 designate 0 as failed, and the third
processor can be designated from the remaining processors (2 through 5). Table
3.1 lists this procedure explicitly.

Figure 3.10 shows the development of the class of errors concerned with
designation of a functional processor as faulty. The possible failures
resulting in a spurious processor failure indication include counting the error
report of a processor which is not working as part of the total disagreement
count, counting a processor's vote on itself, or the designation of a functional
processor on the basis of 1 or no other processor error reports. These failures
will be tested in tests designated as 1100A, 1100B, 1100C, and 1100D. A
reduction in the number of tests to be performed occurs by the fact that these
failures will take place for any value of reconf. Also, because the global
executive acceptance test and alternate operate in the same statement sequence
regardless of the SIFT state (i.e. there is no branching to different modules of
the code depending on the values of working, reconf, or the failure of a
particular processor), the same tests apply to all values of working.

Table 3.2 shows the list of validation tests and the range of working, reconf,
and error reports. Tests 1100A and 1100B can be executed simultaneously with
test 1000A. Test 1100C is executed by placing a single bit in all 36 possible
error reporter positions, setting the corresponding position in reconf, and
determining that both the acceptance test detects the error and the alternate
routine functions correctly. Test 1100D is performed by setting each bit of
reconf to 1 with no bits set in the error reporter words.
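
The test vectors for 1100C and 1100D can be generated mechanically. The Python sketch below is a simplified model, not the validation driver: it assumes 6-bit report words with all processors working, and acceptance_flags and alternate_word are stand-ins written for this example rather than the GEXECTEST and ALTGEXEC procedures themselves.

```python
N = 6  # processors


def votes(reports, working, i):
    """Reports against processor i from working processors other than i."""
    return sum((reports[j] >> i) & 1 for j in range(N) if working[j] and j != i)


def acceptance_flags(reports, working, reconf):
    """True when reconf disagrees with the two-report retirement rule."""
    for i in range(N):
        if not working[i]:
            continue
        marked = bool((reconf >> i) & 1)
        if marked != (votes(reports, working, i) >= 2):
            return True
    return False


def alternate_word(reports, working):
    """Reconfiguration word recomputed from the reports alone."""
    return sum(1 << i for i in range(N) if votes(reports, working, i) >= 2)


working = [True] * N

# 1100C: a single bit in each of the 36 report positions, matching bit in reconf.
cases_1100c = [([(1 << i) if j == r else 0 for j in range(N)], 1 << i)
               for r in range(N) for i in range(N)]
# 1100D: each reconf bit set in turn, with no bits set in the reports.
cases_1100d = [([0] * N, 1 << i) for i in range(N)]

for reports, reconf in cases_1100c + cases_1100d:
    assert acceptance_flags(reports, working, reconf)   # spurious retirement detected
    assert alternate_word(reports, working) == 0        # alternate retires no one

print(len(cases_1100c), len(cases_1100d))
```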
Figure 3.6. Top Level Fault Tree for Global Executive Validation

Figure 3.7. Classes of Global Executive Faults

Figure 3.9. Expansion of Figure 3.7

Figure 3.10. Spurious Identification of a Functional Processor

Table 3.1. Validation Tests for Global Executive Faulty Processor Detection
           Failure


TEST   ERROR DESCR.     working        prevote           reconf          NOTE

1000A  1 FAILED PROC.   0,1,2,3 not    1 reported        0 retiring      1
       UNDETECTED BY    working        (1st indicated
       PRIMARY                         by any 2 other
                                       error reports)

1000B  2 FAILED PROC.   0 not working  2 reported        1 retiring      2
       UNDETECTED BY                   (2nd indicated    (1st in any
       PRIMARY                         by any 2 other    position of
                                       error reports)    reconf)

1000C  3 FAILED PROC.   0 not working  3 reported        2 retiring      3
       UNDETECTED BY                   (3rd indicated    (2nd in any
       PRIMARY                         by any 2 other    position of
                                       error reports)    reconf)


NOTES:

1. Failure of the primary global executive for this condition is manifested by
both the following conditions: (1) one processor is identified as having excess
disagreements by the individual error reports, and (2) the primary global
executive did not mark this processor for retirement in the reconf word. This
validation test is performed with 0, 1, 2, 3 processors not working in order to
determine whether the acceptance test and alternate are capable of detecting a
single (or the first in the case of multiple) processor failure given any SIFT
state. If any more than three processors are not working, the entire computer
fails.

2. Failure of the primary global executive for this condition is manifested by
the following conditions: (1) two processors are identified as having excess
disagreements by the error reports, (2) the primary global executive marked the
first processor for retirement in reconf, and (3) the primary global executive
did not mark the second processor for retirement. Validation testing for
detection of the first processor given any configuration of working with 0, 1,
or 2 processors out and no processors marked for retirement has already been
performed in 1000A. Thus, this validation need only establish that the acceptance
test can detect a second processor as having failed when the primary has
marked only a single processor for retirement in reconf.

3. Failure of the primary global executive for this condition is manifested by
the following conditions: (1) three processors are identified as having excess
disagreements by the error reports, (2) the primary global executive marked the
first two processors for retirement in reconf, and (3) the primary global
executive did not mark the third processor for retirement. Validation testing
for detection of the first processor given any configuration of working with 0,
1, or 2 processors out and no processors marked for retirement has already been
performed in 1000A. Validation of the ability of the acceptance test to detect
the second processor failure has been performed in 1000B. Thus, this validation
need only establish that the acceptance test can detect a third processor as
having failed when the primary has marked only two processors for
retirement in reconf.



Table 3.2. Validation Tests for Incorrect Retirement Errors of
           Global Executive


TEST   ERROR DESCR.     working        prevote           reconf

1100A  PROC. RETIRED    0,1,2,3 not    1 other proc.     1 proc. marked
       ON BASIS OF      working        reporting         for retirement
       SELF DIAGNOSIS                  (any position)    (any position)

1100B  PROC. RETIRED    1 not working  1 other proc.     1 proc. marked
       ON BASIS OF                     reporting         for retirement
       NOT WORKING                     (any position)    (any position)
       PROC. REPORT

1100C  PROC. RETIRED    0 not working  1 other proc.     1 proc. marked
       ON BASIS OF                     reporting         for retirement
       ONLY 1 ERROR                    (any position)    (any position)
       REPORT

1100D  PROC. RETIRED    0 not working  no other proc.    1 proc. marked
       ON BASIS OF                     reporting         for retirement
       NO ERROR                                          (any position)
       REPORTS


APPENDIX A. ERROR REPORTER DRIVER ROUTINES


Although both the error reporter acceptance test and the alternate routine are
relatively brief procedures, a complete test required the simulation of a major
portion of the SIFT operating system. The simulation program, called DRIVER,
prepares the errors and working arrays of the voter and the err word output of
the error reporter based on externally input data. It next invokes the
acceptance test, outputs its results to file TTY (for diagnostic purposes), and
invokes the alternate procedure if an error is detected. A complete listing of
the program follows this description.

Figure A.1 is a hierarchical representation of the program organization. The
main program first invokes procedure IOFILES, which either opens a previously
written test data input file, prepares to write a new file, or simply accepts
input and outputs directly to file TTY. Each of the subsequent procedures
contains branches for the data source and destination defined in this routine.
The main program then invokes procedure LIMREP, which determines the number of
iterations (i.e. frames). FRAME_COUNTER, the next procedure invoked, sets the
value of framecount against which excount, the internal counter of the error
reporter, is compared. The program then invokes the VOTER and ERROR_REPORTER
procedures which, on the basis of input data, prepare the working and errors
arrays and the err and excount variables. The ACCEPTANCE_TEST procedure is then
run, and the alternate error reporter is called by it in the event of the
discrepancies discussed above. Subsequent iterations repeat the process from
FRAME_COUNTER through ACCEPTANCE_TEST until the repetition limit is reached.
Upon exiting the loop, the main program invokes procedure CLOSEFILES, which
closes any of the files opened in IOFILES, and ends the simulation.

It should be noted that the actual error reporter acceptance test and alternate
error reporter which were tested are shown in this listing, and that they are
not identical to those shown in figures 2.2 and 2.3. These latter listings were
changed to be compatible with the SIFT operating system (by including a
prebroadcast(errerr, err) statement) and by eliminating display related
statements (e.g. outputs to TTY and the BINPARS routine which represented the
error words as binary numbers). An additional alteration was made to the
acceptance test routine to include testing of the mismatch variable. None of
these changes are sufficiently significant to warrant additional validation
testing.
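
The checks exercised by DRIVER can be summarised in a compact model. The Python sketch below follows the logic of the ALT_ERROR_REPORTER and ACCEPTANCE_TEST procedures in the listing, under the assumptions that the err word carries the frame count in its upper byte and one error bit per processor in its lower byte, and that a processor is reported when it is not working or has reached the error threshold. The function names are chosen for this sketch.

```python
THRESHOLD = 2  # errors at or above this value mark a processor as faulty


def alt_error_word(errors, working, framecount):
    """Error word as the alternate error reporter would rebuild it."""
    bits = 0
    for i, (e, w) in enumerate(zip(errors, working)):
        if e >= THRESHOLD or not w:
            bits |= 1 << i               # faulty or retired processors are reported
    return bits + 256 * framecount       # frame count goes in the upper byte


def accept(err, errors, working, framecount):
    """True when the primary error word passes the acceptance test."""
    excount, bits = divmod(err, 256)
    if excount != framecount:
        return False                     # primary error reporter did not run
    for i, (e, w) in enumerate(zip(errors, working)):
        healthy = e < THRESHOLD and w
        reported = bool((bits >> i) & 1)
        if healthy == reported:
            return False                 # false positive or omission
    return True


errors = [0, 3, 0, 0, 0, 0]
working = [True, True, True, True, True, False]
word = alt_error_word(errors, working, framecount=4)
print(word, accept(word, errors, working, framecount=4))
```

A word that fails accept corresponds to the cases in which the driver reports the discrepancy and calls the alternate error reporter.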

Appendix C contains a sample output from this driver routine. 


Figure A.1. Organization of Program DRIVER


PROGRAM DRIVER;

CONST
   (*the following declarations are taken from
     the AUGUST, 1980 VERSION OF THE SIFT
     OPERATING SYSTEM *)
   MAXPROCESSORS=5;
   MAXFRAME=50;
   THRESHOLD=2;

TYPE
   PROCESSOR=0..MAXPROCESSORS;
   PROCINT=ARRAY[PROCESSOR] OF INTEGER;
   PROCBOOL=ARRAY[PROCESSOR] OF BOOLEAN;

VAR
   ERR:INTEGER;
   ERRORS:PROCINT;
   REPORT:PROCINT;
   WORKING:PROCBOOL;
   framecount:INTEGER;
   (*the following declarations are necessary for
     the error reporter recovery block *)
   ERFAILS:integer;
   (*the following variables are necessary only
     for the driver procedures*)
   I,J,K:INTEGER;
   RPTLIM:INTEGER;
   FILENAME:PACKED ARRAY[1..8] OF CHAR;
   TITLE:PACKED ARRAY[1..40] OF CHAR;
   FIL:INTEGER;
   INTREP:PROCINT;

PROCEDURE IOFILES;
(*this program sets up files for both input and output
  as determined by FIL input from the keyboard*)
begin
   writeln(tty,'Test of Error Reporter Recovery Block');
   writeln(tty,'I/O options: tty alone(0), input file(1)');
   writeln(tty,' Create File(2) ');
   read(tty,fil);
   if fil>0 then begin
      writeln(tty,'enter filename');
      readln(tty);
      readln(tty,filename);
      if fil=1 then reset(input,filename)
      else rewrite(output,filename);
      writeln(tty,filename,' ready');
   end
   else writeln(tty,'I/O through terminal only');
end;

PROCEDURE LIMREP;
(*SET REPETITION LIMIT FOR MAIN PROCEDURE*)
begin
   if fil<>1 then begin   (*prompts for TTY input*)
      writeln(tty);
      writeln(tty,'enter number of repetitions');
      read(tty,rptlim);
      if fil=2 then writeln(output,rptlim);
   end
   else begin
      read(input,rptlim);
      writeln(tty,rptlim,' repetitions')
   end;
   rptlim:=rptlim-1;
end;

PROCEDURE VOTER;
(*this procedure is to manually input the error[p[i]]
  array generated in the voter routine*)
begin
   if fil<>1 then begin   (*tty input*)
      writeln(tty,'procedure voter --enter errors');
      for i:=0 to maxprocessors do begin
         writeln(tty,'number of errors for processor',i:2);
         read(tty,errors[i]);
         writeln(tty,'working? (1/0)?');
         read(tty,intrep[i]);
         writeln(tty);
         if fil=2 then write(output,errors[i],intrep[i]);
      end
   end
   else
      for i:=0 to maxprocessors do   (*file input*)
         read(input,errors[i],intrep[i]);
   for i:=0 to maxprocessors do   (*all*)
      if intrep[i]<1 then working[i]:=false
      else working[i]:=true;
end;

PROCEDURE BINPARS(VAR NUM:INTEGER);
(*procedure to represent an integer as a 16 bit string *)
var
   binr: array[0..15] of integer;
   tnum:integer;
   divis,i,j:integer;
   byte: packed array[1..20] of char;
begin
   divis:=32768;
   if num>65535 then begin
      writeln(tty,' overflow ');
      num:=num mod 65535;
   end;
   tnum:=num;
   j:=0;
   for i:=15 downto 0 do begin
      if tnum div divis>=1 then begin
         tnum:=tnum mod divis;
         binr[i]:=1
      end
      else binr[i]:=0;
      divis:=divis div 2;
      j:=j+1;
      if binr[i]=1 then byte[j]:='1'
      else byte[j]:='0';
      if (i mod 4=0) then begin
         j:=j+1;   (*step past the digit so the blank separates groups of four*)
         byte[j]:=' ';
      end;
   end;
   writeln(tty,byte);
   writeln(tty);
end;

procedure error_reporter;
(*this procedure is to manually input the report[p[i]]
  array assumed to be generated by the error reporter*)
VAR EXCOUNT:INTEGER;
begin
   if fil<>1 then begin   (*tty input*)
      writeln(tty);
      (*initialize the frame count*)
      writeln(tty,'framecount is',framecount:2,' enter execution');
      read(tty,excount);
      (*error reporter would be incrementing
        its own framecounter here *)
      writeln(tty,'title');
      readln(tty);
      readln(tty,title);
      writeln(tty);
      if fil=2 then write(output,excount,title);
      writeln(tty,'procedure error reporter -- enter report');
      err:=0;
      for i:=maxprocessors downto 0 do begin
         writeln(tty,'proc',i:2,' errrpt.(1/0) =');
         read(tty,report[i]);
         err:=err*2;
         if (not working[i]) or (report[i]>0)
            then err:=err+1;
         if fil=2 then write(output,report[i])
      end;
      writeln(tty);
      err:=err+256*excount;   (*combine error and execution ct *)
      if fil=2 then write(output,err);
   end
   else begin
      (*file input *)
      read(input,excount,title);
      for i:=maxprocessors downto 0 do
         read(input,report[i]);
      read(input,err)
   end;
   writeln(tty);
   writeln(tty,title);
   writeln(tty,'frame no.',framecount:3,'execution',excount:3);
   writeln(tty);
   writeln(tty,'processor':15,'voter error':20,'error report':20,
           ' working':20);
   for i:=0 to maxprocessors do begin
      writeln(tty);
      writeln(tty,i:10,errors[i]:20,report[i]:20,
              intrep[i]:20);
   end;
   writeln(tty);
   writeln(tty,'primary error word=',err:5);
   binpars(err);
end;

PROCEDURE FRAME_COUNTER;
(*This procedure is to simulate the execution counter on the
  error reporter acceptance test by means of manual input *)
begin
   framecount:=framecount+1;
end;

PROCEDURE CLOSEFILES;
(*Close the input or output files if necessary*)
begin
   if fil=1 then close(input);
   if fil=2 then close(output);
end;

PROCEDURE ALT_ERROR_REPORTER;
(*this is the alternate error reporter*)
CONST
   ALLONES=377B;
VAR
   ERRA:INTEGER;   (*alternate error word*)
   I,K:INTEGER;
begin
   writeln(tty);
   writeln(tty,' alternate error reporter invoked');
   erra:=allones;
   k:=1;
   for i:=0 to maxprocessors do
   begin
      if (errors[i]<threshold) and (working[i])
         then erra:=erra-k;
      k:=k*2;
   end;
   erra:=erra - (allones - k + 1);   (*remove leading bits*)
   writeln(tty,' alternate error word= ',erra:5);
   err:=erra + 256*framecount;
   binpars(err);
   writeln(tty)
end;

PROCEDURE ACCEPTANCE_TEST;
(*error reporter acceptance test*)
VAR
   EXCOUNT,WRONG,RIGHT,DIVISOR,CHECK,I,J:INTEGER;
   FAILFLG:BOOLEAN;
begin
   excount:=err div 256;
   err:=err mod 256;
   if excount=framecount then begin
      wrong:=0;
      failflg:=false;
      right:=0;
      divisor:=1;
      for j:=0 to maxprocessors do   (*check for omission errors*)
      begin
         if (errors[j]<threshold) and (working[j])
            then right:=right+1;
         (*count for omissions test*)
         check:=err div divisor;
         (*shift err appropriate
           no. of places to the right*)
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p24100 

; 30000 


2 4 100 

if odd(check) then bea,in 


2 4200 

wr o ng: =wro ng+ 1 ; (^ount fb r omissions 

t A S t* ) 

2 4300 

if (erpors[J]<threshold) and (worki ngf j ] ) 

2 4400 
) 

2 4500 

then f ail f 1 g: = t rue (*chedc fb r false 

po A Itiv es* 

e nd ; 


2 4600 

d ivi so r; =d ivi so r* 2 ; 


2 47 0 0 

e nd ; 


248 0 0 

if wr ong +r igh t< >m axp r oc esso ns -»■ 1 then f ailf 1 g: st rue; 

\ 

2 49 0 0 

( ^^o mi s s io ns test*) 


2 5000 

if failflg then al t_e rror_r epo rt er 


25100 

else wr iteln( t ty error reporter OK* ); 


25200 

e nd 


2 5300 

e Is e ben i n 


2 5400 

wr i t eln ( t ty ) ; 


2 5 50 0 

wr i teln (t ty , * p r in ary error reporter did not run* ); 


25 60 0 

alt e rror^r eporter; 


25 70 0 

wr i teln ( t ty) ; 


2580 0 

e nd; 


2590 0 
26 0 00 

end; 


(*MAIN PROCEDURE*)

BEGIN

  IOFILES;
  LIMREP;
  REPEAT
    FRAME_COUNTER;
    VOTER;
    ERROR_REPORTER;
    ACCEPTANCE_TEST;
  UNTIL framecount>RPTLIM;
  CLOSEFILES;

END.


APPENDIX B. GLOBAL EXECUTIVE DRIVER ROUTINES


A significantly larger set of test cases was necessary for the global executive
validation, and thus its driver routine, GEXEC, used file input exclusively for
the validation test input data. Two routines were used to input test data:
INGEX, which accepted data directly from a terminal for generation of a small
number of test cases, and MVTEST, which had an internal procedure for generation
of a larger number of cases.

Program INGEX consists of 5 procedures: BINPARS, which represents integers as
16-bit binary numbers; CONV, which converts the input error reports and retiring
processors into the integers (err and reconf) used by the global executive;
PRELIM, which opens a file for the test cases; OUTFILE, which writes the data to
the file; and INDATA, which issues prompts to file TTY and processes the
resultant input. The program first opens a file with procedure PRELIM, then
accepts input and writes it to the file until the user-specified number of test
cases has been reached, and finally saves the file for use by GEXEC.
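
The packing done by CONV and the display done by BINPARS can be illustrated
with a short sketch. It is written in Python rather than the Pascal of the
listings, it is not the report's code, and the sample flag list is a
hypothetical test case. Out-of-range values are simply wrapped here, which is
an assumption and not the overflow handling of the listing.

```python
def conv(flags):
    """Pack a list of 0/1 flags into an integer, flag i landing in bit i."""
    word = 0
    weight = 1
    for flag in flags:
        word += flag * weight
        weight *= 2
    return word


def binpars(num):
    """Return num as a 16-bit binary string in four groups of four bits."""
    bits = format(num % 65536, "016b")  # wrap-around is an assumption
    return " ".join(bits[i:i + 4] for i in range(0, 16, 4))


# Hypothetical case: processor 1 flagged, frame count 1 in the upper byte.
report = conv([0, 1, 0, 0, 0, 0]) + 256 * 1
print(report, binpars(report))  # 258 0000 0001 0000 0010
```

The frame count occupies the upper byte because it is multiplied by 256 before
being added, which is the same encoding OUTFILE applies to each error report
and to the reconfiguration word.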

MVTEST is composed of 4 procedures: ZERO, which zeros out the error reporter
representation array for a new case; MVINIT, which initializes an array
containing all possible test cases for a given number of faulty processors;
DISP, which performs additional processing and writes the cases to an output
file; and MATCH, which selects a single test case from the possibilities
stored in the MV array. Modifications to the main procedure, MATCH, and DISP
were made for the generation of test cases for various configurations of the
system (i.e. values of working) and numbers of processors becoming faulty in
the current frame, as described in section 3.5.
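
The enumeration performed by MVINIT can be sketched as follows. This is an
illustrative Python rendering based on the MVTEST listing, not the original
code; the function name mv_columns is hypothetical. Each column marks one
unordered pair of reporting processors, so six processors give 15 columns,
which agrees with the column index range 0..14 (MAXV = 14) in the listing.

```python
from itertools import combinations

MAXPROCESSORS = 5  # processors are numbered 0..5, as in the listings


def mv_columns(maxprocessors=MAXPROCESSORS):
    """Return one 0/1 column per unordered pair of reporting processors."""
    columns = []
    for i, k in combinations(range(maxprocessors + 1), 2):
        column = [0] * (maxprocessors + 1)
        column[i] = 1
        column[k] = 1
        columns.append(((i, k), column))
    return columns


cols = mv_columns()
print(len(cols))   # 15
print(cols[0])     # ((0, 1), [1, 1, 0, 0, 0, 0])
```

MATCH then copies one such column at a time into the error report array for
each candidate faulty processor, so the reporter pairs and the accused
processor vary independently.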

Program GEXEC contains 6 procedures: BINPARS, which was described above;
PREGEXEC, the first module of the global executive acceptance test; ALTGEXEC,
the alternate global executive; GEXECTEST, the second (and main) module of the
acceptance test; INFILE, which reads files created by either INGEX or MVTEST;
and PRELIM, which opens the files used by GEXEC. After PRELIM opens a file, the
program flow is from INFILE, which prepares the input for the acceptance tests
and alternate routine (if necessary), to PREGEXEC, GEXECTEST, and ALTGEXEC (if
invoked by GEXECTEST). This sequence is repeated until the end of file
condition is reached.
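
The first step of that sequence, the frame count check in PREGEXEC, can be
sketched as below. This is a Python illustration of the logic as read from the
GEXEC listing, not the original Pascal; returning a list of stale report
indices is an assumption made for the sketch, whereas the listing prints a
message on TTY.

```python
def pregexec(reports, framecount):
    """Check each report's frame count and duplicate the low byte if current."""
    checked = []
    stale = []
    for j, word in enumerate(reports):
        excount, err = divmod(word, 256)
        if excount == framecount:
            checked.append(257 * err)  # err copied into both bytes
        else:
            checked.append(word)       # left unchanged and flagged as stale
            stale.append(j)
    return checked, stale


# Hypothetical input: five current reports and one with no frame count.
print(pregexec([257, 258, 256, 256, 258, 2], 1))
# ([257, 514, 0, 0, 514, 2], [5])
```

Multiplying by 257 places the same six flag bits in both bytes of the word,
which is what the comment in the listing describes as copying the least
significant bits into the most significant positions.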

A modification of GEXEC, designated VALGEX, was used for creating a more terse
output. This was necessitated by the large number of test cases (almost two
thousand).

As was the case with the error reporter, modifications of the PREGEXEC,
GEXECTEST, and ALTGEXEC procedures were made to remove all TTY I/O, make the
output of the routines compatible with the SIFT operating system, and to include
references to the mismatch variable described in sections 2 and 3. These minor
alterations are not expected to affect the correctness of the routines as
established by this validation.

Listings of INGEX, MVTEST, and GEXEC follow this description, and the output of
GEXEC is described in Appendix C.



PROGRAM INGEX 




PROGRAM INGEX;

CONST

  maxprocessors=5;

TYPE

  processor=0..maxprocessors;
  procint=array[processor] of integer;

VAR

  FILENAME: PACKED ARRAY[1..8] OF CHAR;
  CASENAME: PACKED ARRAY[1..40] OF CHAR;
  CASENO,MAXCASE,FRAMECOUNT: INTEGER;
  NUMREC,NUMOUT,NUMPROC,REPROC,FAULTPROC,
  NUMFAULT,PROCRET,PROCOUT: PROCESSOR;
  TVEC,INTREP,RETIRING: PROCINT;
  ERRORS: ARRAY[PROCESSOR] OF PROCINT;

PROCEDURE BINPARS(VAR NUM:INTEGER);

(*procedure to represent an integer as a 16 bit string*)
var

  binr: array[0..15] of integer;
  tnum: integer;
  divis,i,j: integer;
  byte: packed array[1..20] of char;

begin

  divis:=32768;
  if num>65535 then begin
    writeln(tty,'overflow');
    num:=num mod 65535;
  end;
  tnum:=num;
  j:=0;
  for i:=15 downto 0 do begin
    if tnum div divis>=1 then begin
      tnum:=tnum mod divis;
      binr[i]:=1
    end
    else binr[i]:=0;
    divis:=divis div 2;
    j:=j+1;
    if binr[i]=1 then byte[j]:='1'
    else byte[j]:='0';
    if (i mod 4=0) then begin
      j:=j+1;
      byte[j]:=' ';
    end;
  end;
  writeln(tty);
  writeln(tty,byte);
  writeln(tty);

end;

FUNCTION CONV(ARAY:PROCINT):INTEGER;

VAR i,j,k:integer;
begin

  j:=1;
  k:=0;
  for i:=0 to maxprocessors do begin
    k:=k+aray[i]*j;
    j:=j*2;
  end;
  conv:=k;

end;

PROCEDURE PRELIM;
begin

  writeln(tty,'Enter file name');
  readln(tty);
  read(tty,filename);
  writeln(tty,'Enter total number of processors');
  read(tty,numproc);
  writeln(tty,'Enter number of cases');
  read(tty,maxcase);

end;


PROCEDURE OUTFILE;

(*write output to file and report to tty*)

VAR prevote,j,k,reconf,numfault:integer;
begin

  writeln(output,casename);
  writeln(tty,'case ',casename);
  writeln(tty);
  for k:=0 to maxprocessors do write(output,intrep[k]);
  writeln(tty,'working status');
  for k:=0 to maxprocessors do write(tty,intrep[k]);
  writeln(tty);
  for k:=0 to maxprocessors do begin
    for j:=0 to maxprocessors do tvec[j]:=errors[k,j];
    prevote:=conv(tvec)+256*framecount;
    writeln(tty,'error report for processor ',k:2);
    binpars(prevote);
    write(output,prevote);
  end;
  reconf:=conv(retiring)+256*framecount;
  writeln(tty,'Reconfiguration word');
  binpars(reconf);
  writeln(output,reconf);

end;


PROCEDURE INDATA;

(*This procedure does the actual test case input*)

VAR i,m,n,j:integer;
begin

  writeln(tty,'enter case name');
  readln(tty);
  read(tty,casename);
  writeln(tty,'Enter framecount');
  read(tty,framecount);
  for m:=0 to maxprocessors do begin
    intrep[m]:=0;
    retiring[m]:=0;
    for n:=0 to maxprocessors do errors[m,n]:=0;
  end;

  (*...Prepare the intrep array*)
  writeln(tty,'How many processors are not working?');
  read(tty,numout);
  if numout>0 then begin
    writeln(tty,'which processors not working?');
    for i:=1 to numout do begin
      read(tty,procout);
      intrep[procout]:=1;
    end;
  end;

  (*...Prepare the errors array*)
  writeln(tty,'How many processors are faulty?');
  j:=0;
  readln(tty,numfault);
  if numfault>0 then repeat
    j:=j+1;
    writeln(tty,'wrong processor',j:3);
    writeln(tty,'which processor is faulty?');
    read(tty,faultproc);
    writeln(tty,'how many processors report it as faulty?');
    read(tty,numproc);
    writeln(tty,'which processors reported it?');
    for i:=1 to numproc do begin
      read(tty,reproc);
      errors[reproc,faultproc]:=1;
    end
  until j=numfault;
  writeln(tty);
  writeln(tty,'Summary of Error Reports of all processors');
  writeln(tty,'Reporting Faulty');
  writeln(tty,'processors processors');
  writeln(tty);
  for i:=0 to maxprocessors do begin
    writeln(tty);
    write(tty,i:3,' ');
    for m:=0 to maxprocessors do
      write(tty,errors[i,m]:3);
  end;

  (*...Prepare the reconf word*)
  writeln(tty);
  writeln(tty);
  writeln(tty,'How many processors are reconfigured out?');
  read(tty,numrec);
  if numrec>0 then begin
    writeln(tty,'which processors are reconfigured out?');
    for i:=1 to numrec do begin
      read(tty,procret);
      retiring[procret]:=1;
    end;
  end;

end;


(*MAIN PROCEDURE*)
begin

  prelim;
  if numproc=maxprocessors then begin
    rewrite(output,filename);
    for caseno:=1 to maxcase do begin
      indata;
      outfile;
    end;
    close(output);
  end
  else writeln(tty,'change maxprocessors, current value is ',
               maxprocessors);

end.


PROGRAM MVTEST

PROGRAM MVTEST; 


CONST MAXPROCESSORS=5;
      maxv=14;

VAR
  kount,kw:integer;
  reconf:integer;
  filename:packed array[1..8] of char;
  working:array[0..maxprocessors] of integer;
  MV:ARRAY[0..MAXPROCESSORS,0..MAXV] OF INTEGER;
    (*THE MV ARRAY COLUMNS WILL BE USED IN E TO
      FORM THE DIFFERENT COMBINATIONS OF ERROR
      REPORTS REQUIRED FOR THE VALIDATION *)
  E:ARRAY[0..MAXPROCESSORS,0..MAXPROCESSORS] OF INTEGER;
    (* E IS THE ARRAY REPRESENTING ERROR REPORTS *)
  A:ARRAY[0..1,0..MAXV] OF INTEGER;
    (* A IS THE ARRAY FOR MARKING WHICH PROCS REPORT *)

PROCEDURE ZERO;
(*zero the e array*)
VAR I,J:INTEGER;
begin
  for i:=0 to maxprocessors do
    for j:=0 to maxprocessors do
      e[i,j]:=0;
end;

PROCEDURE MVINIT;
(*initialize the mv and associated arrays*)
VAR I,J,K,L,M:INTEGER;
begin
  for l:=0 to maxprocessors do
    for m:=0 to maxv do mv[l,m]:=0;
  j:=0;
  for i:=0 to maxprocessors-1 do
    for k:=i+1 to maxprocessors do begin
      mv[i,j]:=1;
      mv[k,j]:=1;
      A[0,J]:=I;
      A[1,J]:=K;
      j:=j+1;
    end;
  for l:=0 to maxprocessors do working[l]:=0;
  (*
  working[0]:=1;
  working[1]:=1;
  working[kw]:=1;
  *)
END;

PROCEDURE DISP(VAR L,J:INTEGER);
(* write the output file for use by GEXEC *)
VAR I,K,M,S:INTEGER;
begin
  kount:=kount+1;
  writeln(output,'proc',j:2,' outvoted; procs',
          a[0,l]:2,a[1,l]:2,' reporting, proc. 0 failure rep');
  for i:=0 to maxprocessors do write(output,working[i]);
  for i:=0 to maxprocessors do begin
    m:=1;
    s:=0;
    for k:=0 to maxprocessors do begin
      s:=e[i,k]*m+s;
      m:=m*2;
    end;
    s:=s+256*kount;
    write(output,s);
  end;
  reconf:=reconf+256*kount;
  (*...for more than 1 proc. out, reconf should have
    constants added to it:
      1 - for one proc. out
      3 - for two procs. out
      7 - for 3 procs. out *)
  writeln(output,reconf);
end;

PROCEDURE MATCH;

VAR I,J,K,L,M:INTEGER;

begin

  for l:=0 to maxv do (*..l is col. of mv*)
    for j:=0 to maxprocessors do begin
      zero;
      for i:=0 to maxprocessors do
        e[i,j]:=mv[i,l];
      (*mark procs 0 and 1 excess disagreements here*)
      e[4,0]:=1;
      e[5,0]:=1;
      e[4,2]:=1;
      e[5,2]:=1;
      e[4,1]:=1;
      e[5,1]:=1;
      disp(L,J);
    end;

end;

(*MAIN PROCEDURE*)

BEGIN

  writeln(tty,'2 processors test, enter filename');
  readln(tty);
  read(tty,filename);
  rewrite(output,filename);
  kount:=0;
  writeln(tty,'enter reconf');
  read(tty,reconf);
  (*
  writeln(tty,'which additional, proc. out?');
  read(tty,kw);
  *)
  MVINIT;
  MATCH;
  close(output);
  writeln(tty,'file complete');

END.


PROGRAM GEXEC

PROGRAM GEXEC;
(*This is the set of routines associated with the global
  executive *)

CONST
  MAXPROCESSORS=5;
  maxsubframe=50;
  threshold=2;
  maxbufs=1;
  errerr=1;
TYPE
  processor=0..maxprocessors;
  procint=array[processor] of integer;
  procbool=array[processor] of boolean;
  buffer=0..maxbufs;
VAR
  WORKING:PROCBOOL;
  FRAMECOUNT,CASENO:INTEGER;
  PREVOTE:ARRAY[BUFFER] OF PROCINT;
  RECONF:INTEGER;

PROCEDURE BINPARS(VAR NUM:INTEGER);
(*procedure to represent an integer as a 16 bit string *)
var
  binr: array[0..15] of integer;
  tnum:integer;
  divis,i,j:integer;
  byte: packed array[1..20] of char;
begin
  divis:=32768;
  if num>65535 then begin
    writeln(tty,'overflow');
    num:=num mod 65535;
  end;
  tnum:=num;
  j:=0;
  for i:=15 downto 0 do begin
    if tnum div divis>=1 then begin
      tnum:=tnum mod divis;
      binr[i]:=1
    end
    else binr[i]:=0;
    divis:=divis div 2;
    j:=j+1;
    if binr[i]=1 then byte[j]:='1'
    else byte[j]:='0';
    if (i mod 4=0) then begin
      j:=j+1;
      byte[j]:=' ';
    end;
  end;
  writeln(tty,byte);
  writeln(tty);
end;


PROCEDURE PREGEXEC;
(*This procedure copies the least significant bits of the
  error reporter word bits into the most significant positions
  after checking the frame number *)

VAR
  excount:INTEGER;
  ERR:INTEGER;
  J,M:INTEGER;

begin
  for j:=0 to maxprocessors do begin
    excount:=prevote[errerr,j] div 256;
    err:=prevote[errerr,j] mod 256;
    if excount=framecount then
      prevote[errerr,j]:=257*err
    else writeln(tty,'processor',j:3,' excount mismatch')
  end;
end;

PROCEDURE ALTGEXEC;
(*This is the alternate global executive*)

const
  maxdiv=32;
VAR
  RECONFA,DIVISOR,MULT,J,K,L,M:INTEGER;
  ERCOUNT:PROCINT;
  LAST:INTEGER;
begin
  for j:=0 to maxprocessors do ercount[j]:=0;
  (*...initialize ercount*)
  FOR J:=maxprocessors downto 0 do
    if working[j] then begin
      (*...do for each error report*)
      divisor:=maxdiv;
      for k:=maxprocessors downto 0 do begin
        (*...do for each position of report*)
        if j=k then last:=0
        else last:=prevote[errerr,j] div divisor;
        if odd(last) then ercount[k]:=ercount[k]+1;
        divisor:=divisor div 2;
      end
    end;
  (*...now write reconfa*)
  reconfa:=0;
  mult:=1;
  for l:=0 to maxprocessors do begin
    if ercount[l]>=2 then reconfa:=reconfa+mult;
    mult:=mult*2;
  end;
  writeln(tty,'alternate reconf word');
  binpars(reconfa);
end;


PROCEDURE GEXECTEST;

(*Global Executive Acceptance test*)

TYPE

  ZERO_ONE=0..1;

VAR

  DIVISOR,CHECK,I,J,SUM:INTEGER;
  FAILFLG:BOOLEAN;
  LAST_DIG:ZERO_ONE;

begin

  divisor:=1;
  failflg:=false;
  for i:=0 to maxprocessors do begin
    (*...do for each position of report*)
    (*implement error word shifts here*)
    sum:=0;
    for j:=0 to maxprocessors do begin
      (*...do for each error report*)
      last_dig:=(prevote[errerr,j] div divisor) mod 2;
      if (not working[j]) or (i=j)
        then last_dig:=0;
      sum:=sum+last_dig;
    end;
    check:=reconf div divisor;
    if odd(check) then begin
      if (sum<2) and (working[i]) then failflg:=true
    end
    else if sum>=2 then failflg:=true;
    divisor:=divisor*2;
  end;
  if failflg then altgexec
  else writeln(tty,'global Executive OK');

end;

PROCEDURE INFILE;

(*Read data from file input after main procedure has opened it*)
var

  casename:packed array[1..40] of char;
  intrep:procint;
  k:integer;

begin

  readln(input,casename);
  for k:=0 to maxprocessors do read(input,intrep[k]);
  for k:=0 to maxprocessors do read(input,prevote[errerr,k]);
  readln(input,reconf);
  writeln(tty);
  writeln(tty);
  writeln(tty,casename);
  write(tty,'Case',caseno:3,' Enter framecount ');
  read(tty,framecount);
  writeln(tty);
  writeln(tty,'Failed processors');
  for k:=0 to maxprocessors do begin
    write(tty,intrep[k]:3);
    if intrep[k]=1 then working[k]:=false
    else working[k]:=true;
  end;
  writeln(tty);
  for k:=0 to maxprocessors do begin
    writeln(tty,'error report for processor',k:3);
    binpars(prevote[errerr,k]);
  end;
  writeln(tty,'Reconfiguration Word');
  binpars(reconf);

end;

PROCEDURE PRELIM;

(*Initial prompts and opening of data file*)
var filename:packed array[1..8] of char;
begin

  writeln(tty,'Global Executive Recovery Block Driver');
  writeln(tty,'Enter Data File');
  readln(tty);
  read(tty,filename);
  reset(input,filename);

end;

(*MAIN PROCEDURE*)
begin

  prelim;
  caseno:=0;
  while not eof(input) do begin
    caseno:=caseno+1;
    infile;
    pregexec;
    gexectest
  end;
  writeln(tty,'Tests Complete');
  close(input);

end.


APPENDIX C. DEMONSTRATION OF VALIDATION PROCEDURES


This appendix contains examples of output which demonstrate the manner in which
the fault-tolerant software for the error reporter and the global executive was
validated. Section C.1 describes the output from DRIVER used to demonstrate
the correctness of the error reporter, and section C.2 describes the GEXEC
output which showed the proper operation of the fault tolerant global executive.


C.1. Error Reporter Validation


Figure C.1 is the output generated by the DRIVER program using data for 1
processor out. A total of five "frames" (i.e. test cases) are shown. The first
line is the abbreviated title "proc 1 exc undtctd err", which is the designation
for processor no. 1 having an excess number of errors undetected by the primary
error reporter. The next line shows the value of framecount and excount (which
were taken to be the same for the cases shown here). The next item on the
output is a table showing the number of errors counted by the voter, the error
reporter output (0 = no excess disagreements, 1 = excess disagreements), and the
working status (0 = not working, 1 = working) for each of the six processors.
The following line shows the integer value of the error word including the frame
count encoded in the 8 most significant bits, and immediately below it is the
binary representation produced by procedure BINPARS (see appendix B).
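
As a small illustration of that encoding, the sketch below splits an error word
into its frame count and its per-processor flags. It is a Python sketch and not
part of the DRIVER program; the name split_error_word is hypothetical and the
six-processor width is taken from the listings.

```python
def split_error_word(word, nprocs=6):
    """Split an error word into (frame count, per-processor flag list)."""
    framecount, flags = divmod(word, 256)
    return framecount, [(flags >> i) & 1 for i in range(nprocs)]


print(split_error_word(256))  # (1, [0, 0, 0, 0, 0, 0])
print(split_error_word(258))  # (1, [0, 1, 0, 0, 0, 0])
```

The first call corresponds to the primary error word of the first frame in
Figure C.1, which carries frame count 1 and flags no processor.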

Because the primary error report (contained in the file) was incorrect, the
error reporter acceptance test invoked the alternate, which generated an error
report whose integer value (not including the frame count) is shown on the next
line and whose binary representation (including the frame count) is shown
immediately below.
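
The decision just described can be reproduced with the following sketch of the
acceptance test and the alternate error reporter. It is a Python rendering of
the logic in the DRIVER listing rather than the original Pascal, and the
threshold value of 2 is an assumption borrowed from the GEXEC constants, since
the DRIVER declarations are not reproduced here.

```python
THRESHOLD = 2  # assumed value; the GEXEC listing declares threshold=2


def alt_error_word(errors, working, threshold=THRESHOLD):
    """Set bit i for every processor that is not both working and below threshold."""
    word = 0
    for i, (count, ok) in enumerate(zip(errors, working)):
        if not (count < threshold and ok):
            word |= 1 << i
    return word


def acceptance_test(err_word, framecount, errors, working, threshold=THRESHOLD):
    """Return True when the primary error word passes both checks."""
    excount, flags = divmod(err_word, 256)
    if excount != framecount:
        return False                      # primary did not run this frame
    right = wrong = 0
    for j, (count, ok) in enumerate(zip(errors, working)):
        healthy = count < threshold and ok
        if healthy:
            right += 1
        if (flags >> j) & 1:
            wrong += 1
            if healthy:
                return False              # false positive
    return wrong + right == len(errors)   # omissions test


# First frame of Figure C.1: processor 1 has 3 voter errors, primary word 256.
errors = [0, 3, 0, 0, 0, 0]
working = [True] * 6
print(acceptance_test(256, 1, errors, working))  # False
print(alt_error_word(errors, working))           # 2
```

The omissions test works by counting: every processor must be either healthy
or flagged, so a processor that is neither leaves the two counts one short of
six and the primary word is rejected.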

This particular case demonstrates that the acceptance test can detect failure of
the primary error reporter to note an excess number of disagreements in
processor 1 when no other processors have failed and when all are working.
Succeeding cases shown in this output demonstrate that failure of the primary
routine to detect excess disagreements for processors 2, 3, 4, and 5 is likewise
detected when no processors have been retired or have become faulty in this
frame. The entire validation sequence described in section 2.5 consists of
performing a sequence similar to this for processors 0 through 5 when 1, 2, or 3
additional processors become faulty in the current frame and when 1, 2, or 3
other processors have been retired. Although these validations were performed,
they are not included in this report for the sake of brevity.


err1
ready
5 repetitions

proc 1 exc undtctd err
frame no. 1 execution 1

 processor   voter error   error report   working
     0            0              0            1
     1            3              0            1
     2            0              0            1
     3            0              0            1
     4            0              0            1
     5            0              0            1

primary error word=  256
0000 0001 0000 0000

alterate error reporter invoked
alternate error word=    2
0000 0001 0000 0010


proc 2 exc undtctd err
frame no. 2 execution 2

 processor   voter error   error report   working
     0            0              0            1
     1            0              0            1
     2            5              0            1
     3            0              0            1
     4            0              0            1
     5            0              0            1

primary error word=  512
0000 0010 0000 0000

alterate error reporter invoked
alternate error word=    4
0000 0010 0000 0100


proc 3 exc undtctd disgr
frame no. 3 execution 3

 processor   voter error   error report   working
     0            0              0            1
     1            0              0            1
     2            0              0            1
     3            4              0            1
     4            0              0            1
     5            0              0            1

primary error word=  768
0000 0011 0000 0000

alterate error reporter invoked
alternate error word=    8
0000 0011 0000 1000


proc 4 exc undtctd err
frame no. 4 execution 4

 processor   voter error   error report   working
     0            0              0            1
     1            0              0            1
     2            0              0            1
     3            0              0            1
     4            6              0            1
     5            0              0            1

primary error word= 1024
0000 0100 0000 0000

alterate error reporter invoked
alternate error word=   16
0000 0100 0001 0000


proc 5 exc undtctd err
frame no. 5 execution 5

 processor   voter error   error report   working
     0            0              0            1
     1            0              0            1
     2            0              0            1
     3            0              0            1
     4            0              0            1
     5            3              0            1

primary error word= 1280
0000 0101 0000 0000

alterate error reporter invoked
alternate error word=   32
0000 0101 0010 0000

Figure C.1. Error Reporter Validation Output


C.2. Global Executive


Figure C.2 shows an excerpt from the output generated by program GEXEC. Two
cases are shown from an MVTEST generated file containing cases in which
processor 1 is marked as having failed by processors 4 and 5, and 1 additional
processor is marked for retirement in the reconfiguration word by the primary
global executive. The first line shows the title of the case, i.e. "proc 0
outvoted; procs 0 1 reporting". Thus, processor 0 is marked as having excess
disagreements by processors 0 and 1, and processor 1 is indicated as having
excess disagreements in the error reports of processors 4 and 5. The second
line of the output is the frame count check, which in this case matches the
execution count, so that PREGEXEC finds that all error reports are current.


The following line gives the configuration of the system, and shows that no
processors are failed (i.e. retired). The following 6 output items are the
binary representations of the six processor error reports. The error report
from processor 0 is marking itself for retirement; the report from processor 1
agrees. No processors are indicated as faulty in the error reports of
processors 2 and 3, but processors 4 and 5 indicate that processor 1 should be
retired.


The next item is the reconfiguration word generated by the primary global
executive. It indicates that processor 0 should be retired, and that the
current frame count is 1 (in bit position 8). The global executive acceptance
test detects an error, and invokes the alternate routine, which marks processor
1 for retirement as shown in the last output item.
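
The check and the recovery for this case can be traced with the sketch below.
It follows the GEXECTEST and ALTGEXEC listings as read from the scanned text
but is written in Python, so it should be taken as an illustration and not as
the validated code; the threshold of 2 votes and the exclusion of self-reports
and of reports from non-working processors are as interpreted from those
listings.

```python
def votes_against(reports, working, k):
    """Count working processors other than k whose report flags processor k."""
    return sum(
        (reports[j] >> k) & 1
        for j in range(len(reports))
        if working[j] and j != k
    )


def gexec_test(reports, reconf, working, threshold=2):
    """Return True when reconf agrees with the votes in the error reports."""
    for i in range(len(reports)):
        votes = votes_against(reports, working, i)
        marked = (reconf >> i) & 1
        if marked and votes < threshold and working[i]:
            return False   # functional processor marked for retirement
        if not marked and votes >= threshold:
            return False   # failed processor not marked
    return True


def alt_gexec(reports, working, threshold=2):
    """Rebuild the reconfiguration word directly from the vote counts."""
    word = 0
    for k in range(len(reports)):
        if votes_against(reports, working, k) >= threshold:
            word |= 1 << k
    return word


# Case 1 of Figure C.2: frame count 1, primary word marks processor 0.
reports = [257, 257, 256, 256, 258, 258]
working = [True] * 6
print(gexec_test(reports, 257, working))  # False
print(alt_gexec(reports, working))        # 2
```

Processor 0 collects only one admissible vote (from processor 1, its own
report being ignored), while processor 1 collects two, which is why the
alternate word marks processor 1 and not processor 0.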

This particular case demonstrates that the acceptance test can detect the error
of simultaneously marking a functional processor for retirement (processor 0)
and not detecting a failed processor (processor 1). The second case shown in
figure C.2 shows that processors 0, 1, 4, and 5 all indicate that processor 1
should be retired, but that the primary reconfiguration word marks processor 0
for retirement. Once again, the recovery block can detect and correct the
error.


Close to 2,000 cases of this type were run, and In order to reduce the amount of 
output, GEXEC was modified to show only the case title, whether or not a 
processor which should have been retired was still generating error reports, 
whether the primary global executive output was accepted, and If not, the value 
of the alternate acceptance test was shown. Figure C.3 shows the beginning of 
such an output for failure to detect one faulty processor when one other was 
retired. The first item on the page Is the prompt generated by the modified 
GEXEC program for the data file name. The next Items show that the 
reconfiguration word Is given as 0 throughout the file (I.e. no processors are 
marked for retirement by the primary global executive in this set of test cases) 
and that processor 1 is Indicated as not working. The set of possibilities 
generated within GEXEC did not exclude processors marking themselves for 
retirement or having not working processors generating error reports. Thus, the 
first test case of figure C.3, processors 0 and 1 marking processor 0 for 
retirement. Because this condition would not lead to the retirement of 
processor 0, the primary error word Is correct. In the second case, processor 1 
Is Indicated as having excess disagreements by processors 0 and 1. Because 
processor 1 should have been retired, this is possibly a serious condition, and 
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the global executive Indicates that there may be a problem (by itself, the 
global executive can not diagnose and trace the problem) to the system In the 
message "retired processor working". In the third case, the error report from a 
retired processor along with only one other processor indicates that a third 
should be marked for retirement. This Is not a sufficiently strong case for 
retiring processor 2, so the reconfiguration word Is correct. 
proc 0 outvoted; procs 0 1 reporting 

Case 1 Enter framecount 1 

Failed processors 
0 0 0 0 0 0 
error report for processor 0 
0000 0001 0000 0001 

error report for processor 1 
0000 0001 0000 0001 

error report for processor 2 
0000 0001 0000 0000 

error report for processor 3 
0000 0001 0000 0000 

error report for processor 4 
0000 0001 0000 0010 

error report for processor 5 
0000 0001 0000 0010 

Reconfiguration Word 
0000 0001 0000 0001 

alternate reconf word 
0000 0000 0000 0010 


Figure C.2. Global Executive Validation Output 



proc 1 outvoted; procs 0 1 reporting 

Case 2 Enter framecount 2 

Failed processors 
0 0 0 0 0 0 
error report for processor 0 
0000 0010 0000 0010 

error report for processor 1 
0000 0010 0000 0010 

error report for processor 2 
0000 0010 0000 0000 

error report for processor 3 
0000 0010 0000 0000 

error report for processor 4 
0000 0010 0000 0010 

error report for processor 5 
0000 0010 0000 0010 

Reconfiguration Word 
0000 0011 0000 0001 

alternate reconf word 
0000 0000 0000 0010 


Figure C.2. (continued) Global Executive Validation Output 





reconf and working held constant 
Reconfiguration Word (reconf) 

0000 0001 0000 0000 

Processor statuses; 0 working/ 1 failed 
0 1 0 0 0 0 


proc 0 outvoted; procs 0 1 reporting 
global Executive OK 

proc 1 outvoted; procs 0 1 reporting 
retired proc. working 
global Executive OK 

proc 2 outvoted; procs 0 1 reporting 
global Executive OK 

proc 3 outvoted; procs 0 1 reporting 
global Executive OK 

proc 4 outvoted; procs 0 1 reporting 
global Executive OK 

proc 5 outvoted; procs 0 1 reporting 
global Executive OK 

proc 0 outvoted; procs 0 2 reporting 
global Executive OK 

proc 1 outvoted; procs 0 2 reporting 
retired proc. working 
retired proc. working 
global Executive OK 

proc 2 outvoted; procs 0 2 reporting 
global Executive OK 

proc 3 outvoted; procs 0 2 reporting 
alternate reconf word 
0000 0000 0000 1000 

proc 4 outvoted; procs 0 2 reporting 
alternate reconf word 
0000 0000 0001 0000 

proc 5 outvoted; procs 0 2 reporting 
alternate reconf word 
0000 0000 0010 0000 

proc 0 outvoted; procs 0 3 reporting 
global Executive OK 

proc 1 outvoted; procs 0 3 reporting 
retired proc. working 
retired proc. working 
global Executive OK 

proc 2 outvoted; procs 0 3 reporting 
alternate reconf word 
0000 0000 0000 0100 
Figure C.3. Global Executive (VALGEX) Validation Output 
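
The alternate reconfiguration words in Figure C.3 appear to use one bit per 
processor, with processor 0 in the rightmost position. The Python sketch below 
decodes a printed word on that assumption. The function name 
marked_processors, the six-processor default, and the bit ordering are 
inferred from the figure and are not taken from the GEXEC source. 

```python
def marked_processors(word, n_procs=6):
    """List the processors whose bit is set in a printed binary word.

    word    -- binary digits as printed, with or without spaces
    n_procs -- number of processors to examine (assumed to be six)
    """
    value = int(word.replace(" ", ""), 2)
    # Bit p, counted from the right, is assumed to stand for processor p.
    return [p for p in range(n_procs) if (value >> p) & 1]


print(marked_processors("0000 0000 0000 1000"))   # [3]
print(marked_processors("0000 0000 0010 0000"))   # [5]
```

An all-zero word decodes to an empty list, which corresponds to no processor 
being marked for retirement. 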